The enzyme thymidylate kinase phosphorylates the substrate thymidine 
5'-phosphate (dTMP) to form thymidine 5'-diphosphate (dTDP), which is 
further phosphorylated to dTTP for incorporation into DNA. Ehrlichia 
chaffeensis is the etiologic agent of human monocytotropic ehrlichiosis (HME), 
a potentially life-threatening tick-borne infection. HME is endemic in the 
United States from the southern states up to the eastern seaboard. HME is 
transmitted to humans via the lone star tick Amblyomma americanum. Here, the 
2.15 Å resolution crystal structure of thymidylate kinase from E. chaffeensis in 
the apo form is presented. 



1. Introduction 

Thymidylate kinase (TMPK) phosphorylates the substrate thymidine 
5'-phosphate (dTMP) to form thymidine 5'-diphosphate (dTDP). The 
overall reaction is as follows: 

ATP-Mg²⁺ + dTMP ⇌ ADP-Mg²⁺ + dTDP. 

The newly formed dTDP is subsequently phosphorylated to dTTP 
by nucleoside-diphosphate kinase for incorporation into DNA. The 
essentiality of dTTP for DNA synthesis makes TMPK a desirable 
drug target (Kandeel et al., 2009). There are ~60 thymidylate kinase 
structures from 19 species currently deposited in the Protein Data 
Bank (PDB). The first of these protein structures was solved from 
herpes simplex virus type I (Wild et al., 1997). 

Ehrlichia chaffeensis is an obligate intracellular Gram-negative 
coccus bacterium. E. chaffeensis is the etiologic agent of a zoonotic 
infection occurring in a deer-tick cycle and is spread via the lone star 
tick Amblyomma americanum to the white-tailed deer Odocoileus 
virginianus and occasionally to humans. The lone star tick is primarily 
found in the southern and southeastern United States. E. chaffeensis 
is the causative agent of human monocytotropic ehrlichiosis (HME). 

HME was first identified in 1987. Between its discovery and 2005 
there were a total of 2396 reported cases of HME, with 471 occurring 
in 2005 and a trend of increasing infections from 2001 to 2005. HME 
can present as a mild asymptomatic infection. The most common 
symptoms, found in over 50% of patients, include fever, headache, 
malaise, myalgia and nausea (Dumler et al., 2007). Current treatment 
for HME consists of the antimicrobial doxycycline, or rifampicin 
when doxycycline cannot be used owing to adverse reactions. As with 
many infectious diseases, there is a desire to develop better targeted 
drugs to treat HME. The mission of the Seattle Structural Genomics 
Center for Infectious Disease (SSGCID) is to provide a blueprint for 
structure-guided drug-design efforts. 




2. Methods 

2.1. Protein expression and purification 

The gene encoding thymidylate kinase was amplified via PCR in a 
96-well format using genomic DNA as a template. We used the ligase- 
independent cloning (LIC) technique (Aslanidis & de Jong, 1990). 
The primers were designed with an additional LIC sequence at the 5' 
ends that is complementary to the LIC sequence in the plasmid vector 
(Mehlin et al., 2006; Choi et al., 2011). Purified PCR products were 
then cloned via LIC into the AVA0421 expression vector (Quartley 
et al., 2009), which provides a cleavable hexahistidine tag at the 
N-terminus of the expressed protein with the sequence 
MAHHHHHHMGTLEAQTQ'GPGS (Choi et al., 2011). The recombinant 
plasmids were then transformed into Escherichia coli Rosetta Oxford 
strain [BL21*(DE3)-R3-pRARE2] cells for expression testing. The 
University of Washington Protein Production Group (UW-PPG) 
utilizes a recombinant human rhinovirus 3C protease MBP fusion 
(His-MBP-3C protease) to cleave the hexahistidine tag (Bryan et al., 
2011). When the tag is cleaved, the short GPGS sequence is left on 
the N-terminus of the full-length thymidylate kinase recombinant 
protein. The gene was assigned the SSGCID clone name 
EhchA.01616.a and will hereafter be referred to as 
EhchA.01616.a/TMPK. 

The transformed cells were tested for expression of soluble protein 
in a high-throughput screen and were then moved on to large-scale 
expression (Choi et al., 2011). Starter cultures of LB broth with 
appropriate antibiotics were grown for ~18 h at 310 K. ZYP-5052 
auto-induction medium was freshly prepared as per UW-PPG standard 
protocols (Choi et al., 2011; Studier, 2005). The bottles were 
inoculated with all of the overnight culture. Inoculated bottles were 
then placed into a LEX bioreactor (Harbinger, Ontario, Canada). 
Cultures were grown for ~24 h at 298 K; the temperature was then 
reduced to 288 K and the culture was grown for a further ~60 h. To 
harvest, the culture was centrifuged at 4000g for 20 min at 277 K. Cell 
paste was flash-frozen in liquid nitrogen and stored at 193 K. 

Frozen recombinant cells were resuspended in a lysis buffer consisting 
of 25 mM HEPES pH 7.0, 500 mM NaCl, 5% glycerol, 0.5% 
CHAPS, 30 mM imidazole, 10 mM MgCl₂, 1 mM 
tris(2-carboxyethyl)phosphine (TCEP), 250 µg ml⁻¹ 
4-(2-aminoethyl)benzenesulfonyl fluoride hydrochloride (AEBSF) and 
0.025% sodium azide. The cells were ruptured via sonication, which was 
followed by incubation with benzonase nuclease (Invitrogen, Carlsbad, 
California, USA). The crude lysate was centrifuged at 31 500g and 277 K 
for 75 min and the supernatant was loaded onto a Nickel HisTrap FF 
5 ml column (GE Healthcare, Piscataway, New Jersey, USA) for 
immobilized metal-affinity chromatography (IMAC). The column 
was washed with 20 column volumes of wash buffer (25 mM HEPES 
pH 7.0, 500 mM NaCl, 5% glycerol, 30 mM imidazole, 1 mM TCEP 
and 0.025% sodium azide). The bound protein was eluted with seven 
column volumes of elution buffer (25 mM HEPES pH 7.0, 500 mM 
NaCl, 5% glycerol, 1 mM TCEP, 250 mM imidazole and 0.025% 
sodium azide). Protein precipitation was observed in the elution 
fractions. An additional centrifugation step at 4000g was implemented 
to remove any insoluble aggregates that had formed. 1 mM 
adenosine diphosphate (ADP) and 1 mM MgCl₂ were added to the 
soluble protein in an attempt to prevent further aggregation. 

Cleavage of the N-terminal His tag was accomplished by overnight 
dialysis at 277 K with His-MBP-3C protease in buffer consisting of 
25 mM HEPES pH 7.5, 500 mM NaCl, 5% glycerol, 1 mM TCEP, 
0.025% sodium azide, 1 mM ADP and 1 mM MgCl₂. The cleaved 
protein was recovered in both the flowthrough and wash fractions of a 
second Ni²⁺-affinity chromatography step that also removed the 
His-MBP-3C protease, uncleaved protein and cleaved His tag. This IMAC 
clarification step utilized the same buffers as the initial IMAC 
purification. After affinity-tag cleavage, a tag remnant GPGS was left on 
the N-terminus of the full-length EhchA.01616.a/TMPK. Centrifugation 
at 43 000g for 30 min was performed to remove any precipitated 
protein that had formed during the cleavage/dialysis step. The 
soluble cleaved protein was further polished using a HiLoad 26/60 
Superdex 75 prep-grade column (GE Healthcare) equilibrated with 
25 mM HEPES pH 7.0, 500 mM NaCl, 5% glycerol, 2 mM 
dithiothreitol (DTT), 0.025% sodium azide, 1 mM ADP and 1 mM MgCl₂. 
SDS-PAGE analysis was used to determine which fractions to pool. 
The purified protein was concentrated to 24 mg ml⁻¹ and stored at 
193 K. 

Table 1 

Data-collection statistics. 

Values in parentheses are for the highest of 20 resolution shells. 

  Wavelength (Å)               0.9774 
  Space group                  P2₁2₁2₁ 
  Unit-cell parameters (Å)     a = 39.17, b = 144.82, c = 146.84 
  Resolution range (Å)         50-2.15 (2.21-2.15) 
  Unique reflections           46708 (3410) 
  Multiplicity                 5.9 (5.8) 
  Completeness (%)             100 (99.3) 
  R_merge†                     0.086 (0.529) 
  Mean I/σ(I)                  16.8 (3.7) 

† $R_{\mathrm{merge}} = \sum_{h}\sum_{i} |I_i(h) - \langle I(h)\rangle| / \sum_{h}\sum_{i} I_i(h)$. 

2.2. Crystallization 

Thawed protein was used to set up four sparse-matrix screens, 
JCSG+ (Emerald BioStructures, Bainbridge Island, Washington, 
USA), Crystal Screen and Index HT (Hampton Research, Aliso 
Viejo, California, USA) and PACT (Molecular Dimensions, 
Newmarket, Suffolk, UK), following an extended Newman strategy 
(Newman et al., 2005). 0.4 µl protein solution was then mixed with 
0.4 µl well solution and equilibrated against a 100 µl reservoir using 
96-well Compact Jr crystallization plates (Emerald BioSystems). 
Crystals suitable for diffraction studies were found in condition G8 
from the PACT screen: 100 mM Bis-Tris propane pH 7.5, 200 mM 
sodium sulfate, 20% PEG 3350. The crystals were cryoprotected with 
an additional 25% ethylene glycol. 

2.3. Data collection and structure determination 

A diffraction data set was collected on 2 December 2009 on ALS 
beamline 5.0.1 at the Berkeley Center for Structural Biology in the 
context of the Collaborative Crystallography program using a 3 × 3 
tiled ADSC Q315r detector. 150 images were collected with a 
φ-slicing of 1° per image. The diffraction data were reduced in space 
group P2₁2₁2₁ to 2.15 Å resolution with XDS/XSCALE (Kabsch, 
2010; Table 1). 
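
The R_merge statistic reported in Table 1 measures how far repeated, 
symmetry-related intensity measurements scatter around their mean. The 
Python sketch below illustrates that definition on a small made-up set of 
observations. The function name r_merge and the toy intensities are 
assumptions introduced for illustration; they are not data from this study 
and not part of XDS/XSCALE. 

```python
from collections import defaultdict


def r_merge(observations):
    """Return R_merge for an iterable of (hkl, intensity) pairs.

    R_merge = sum_h sum_i |I_i(h) - <I(h)>| / sum_h sum_i I_i(h),
    where <I(h)> is the mean intensity of all measurements of reflection h.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Hypothetical observations: two unique reflections measured 3 and 2 times.
toy_observations = [
    ((1, 0, 0), 100.0),
    ((1, 0, 0), 110.0),
    ((1, 0, 0), 90.0),
    ((0, 2, 0), 50.0),
    ((0, 2, 0), 54.0),
]

print(f"R_merge = {r_merge(toy_observations):.3f}")  # prints R_merge = 0.059
```

In real data reduction the same sum runs over all measured reflections, 
either overall or within one resolution shell, which is how the overall 
and highest-shell values in Table 1 are distinguished. 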

The packing density (Matthews, 1968) suggested four molecules 
per asymmetric unit, with a V_M of 2.24 Å³ Da⁻¹ and 45% solvent 
content. A search of the PDB for sequence homology yielded 
thymidylate kinase from Aquifex aeolicus (PDB entry 2pbr; J. 
Jeyakanthan, S. P. Kanaujia, C. Vasuki Ranjani, K. Sekar, N. Nakagawa, 
A. Ebihara, S. Kuramitsu, A. Shinkai, Y. Shiro & S. Yokoyama, 
unpublished work) as the closest sequence homolog, with 45% 
sequence identity. Molecular replacement was performed with the 
CCP4 (Winn et al., 2011) program Phaser (McCoy et al., 2007) using 
data between 20 and 3.5 Å resolution. The initial search model was 
modified with the CCP4 program CHAINSAW (Stein, 2008) based 
on sequence alignment with 2pbr. However, a search with the 
modified monomer A from 2pbr was not successful. A further 
truncation of the C-terminal residues 137-197 yielded convincing 
solutions for four monomers. Phases were improved with the CCP4 
program Parrot (Cowtan, 2010) including NCS averaging. The CCP4 
program Buccaneer (Cowtan, 2006) was then used to extend the 
initial model; the improved phases from Parrot were included during 
this process. 658 residues were built in 12 separate chains. The R_work 
of 0.385 and R_free of 0.429 indicated a rather incomplete model. The 
model from Buccaneer was then used for model extension in 
ARP/wARP (Langer et al., 2008), which built 633 residues in 18 chains with 
significantly improved R factors: R_work = 0.228 and R_free = 0.324. The 
model was then iteratively extended manually using Coot (Emsley et 
al., 2010) followed by cycles of reciprocal-space refinement with the 
CCP4 program REFMAC5 (Murshudov et al., 2011). The final model 
could be refined with one TLS group per chain to an R_work of 0.187 
and an R_free of 0.232 with good stereochemistry (Table 2). The model 
was validated with the validation tools in Coot and MolProbity (Chen 
et al., 2010). The final model extends from residue Pro−2 to Gln200 
for chains A and C and from Pro−2 to Met201 for chains B and D. In 
each chain residues 135-150 are too disordered to be modeled and 
there is a varying amount of disorder in the four chains between 
residues 178 and 189. There are two sets of Ramachandran outliers in 
this structure. The first set comprises Arg93 and Phe94 from each 
chain, which are located in a loop between a β-strand and an α-helix. 
The electron density for these two residues is well defined. The second 
set is the peptide bond between Pro−2 and Gly−1, which are part of the 
purification tag. The four chains almost superimpose and show good 
electron density; however, this peptide bond lies in the allowed 
Ramachandran region for two chains and in the disallowed region for 
the other two chains. One sulfate molecule from the precipitant could 
be located in each chain and some ethylene glycol from the 
cryoprotectant could be placed. 

Table 2 

Refinement and model statistics. 

Values in parentheses are for the highest of 20 resolution shells. 

  Resolution range (Å)               50-2.15 (2.21-2.15) 
  R_cryst†                           0.189 (0.205) 
  R_free†                            0.232 (0.268) 
  R.m.s.d. bonds (Å)                 0.014 
  R.m.s.d. angles (°)                1.34 
  Protein atoms                      5575 
  Nonprotein atoms                   329 
  Wilson B factor (Å²)               2?.1 
  Mean B factor (Å²)                 31.5 
  Residues in favored region         656 [94%] 
  Residues in allowed region         22 [3.2%] 
  Residues in disallowed region      10 [1.5%] 
  MolProbity score [percentile]      1.61 [96th] 
  PDB code                           3ld9 

† $R_{\mathrm{cryst}} = \sum_{hkl} \bigl| |F_{\mathrm{obs}}| - |F_{\mathrm{calc}}| \bigr| / \sum_{hkl} |F_{\mathrm{obs}}|$. The free R factor was calculated using an 
equivalent equation with the 5% of the reflections that were omitted from the 
refinement. 
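
The packing-density estimate quoted in this section (V_M of 2.24 Å³ Da⁻¹, 
45% solvent, four molecules per asymmetric unit) can be checked with a 
short calculation. The sketch below assumes the standard Matthews 
relation, solvent fraction ≈ 1 − 1.23/V_M, and an orthorhombic cell with 
four symmetry operators for P2₁2₁2₁. The molecular mass it prints is 
back-calculated from the reported V_M and the Table 1 cell; it is not a 
value stated in the text, and the function names are illustrative. 

```python
# Unit-cell edges in angstroms, taken from Table 1 (orthorhombic cell).
CELL_A, CELL_B, CELL_C = 39.17, 144.82, 146.84
SYMMETRY_OPERATORS = 4    # general positions in space group P2(1)2(1)2(1)
MOLECULES_PER_ASU = 4     # as suggested by the packing density
V_M = 2.24                # Matthews coefficient, A^3 per Da, from the text


def solvent_fraction(v_m):
    """Approximate solvent fraction from the Matthews coefficient.

    Uses the common approximation 1 - 1.23 / V_M (Matthews, 1968).
    """
    return 1.0 - 1.23 / v_m


def implied_molecular_mass(cell_volume, v_m, n_sym, n_mol):
    """Mass per molecule (Da) implied by V_M = V_cell / (n_sym * n_mol * M)."""
    return cell_volume / (v_m * n_sym * n_mol)


# For an orthorhombic cell all angles are 90 degrees, so V = a * b * c.
cell_volume = CELL_A * CELL_B * CELL_C

print(f"solvent fraction: {solvent_fraction(V_M):.2f}")  # prints 0.45
mass = implied_molecular_mass(
    cell_volume, V_M, SYMMETRY_OPERATORS, MOLECULES_PER_ASU
)
print(f"implied mass per molecule: {mass / 1000:.1f} kDa")
```

The solvent fraction reproduces the 45% quoted above, and the implied 
mass of roughly 23 kDa per chain is in the range expected for a protein 
of about 200 residues. 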



3. Results and discussion 

3.1. Overall EhchA.01616.a/TMPK structure 

Full-length EhchA.01616.a/TMPK could be purified with crystal- 
lizable quality. The full-length protein with the affinity-tag remnant 
sequence GPGS at the N-terminus crystallized rather readily and a 
2.15 Å resolution data set was collected on ALS beamline 5.0.1 
without further optimization of crystallization conditions. Despite 
high sequence identity (45%) to PDB entry 2pbr, molecular 
replacement was not straightforward. A significant C-terminal 
truncation was required for the search model to yield a solution. In 
hindsight, this could be explained by a larger structural difference 
between EhchA.01616.a/TMPK and 2pbr at the C-terminus 
compared with the N-terminus. A significant peak in a native 
Patterson map (20% height of the origin peak) indicated a pseudo- 
translational symmetry, which tends to complicate molecular- 
replacement searches. 

The model of EhchA.01616.a/TMPK consists of four monomers 
per asymmetric unit. Interface analysis with PISA (Krissinel & 
Henrick, 2007) supports the presence of two separate dimers (AB and 
CD). The buried surface area was ~1025 Å² per monomer compared 
with a surface area of ~9000 Å² per monomer and the free binding 
energy was estimated as ΔG^int = −84 kJ mol⁻¹. The largest 
crystal-packing interface has a buried surface area of ~600 Å² and can 
only be found once in the crystal lattice. Dimers are typically observed for 
thymidylate kinases and the dimers of EhchA.01616.a/TMPK have 
the same quaternary structure as other TMPKs deposited in the PDB. 
Hence, we are confident that the dimer seen twice in this structure is 
the native dimer of EhchA.01616.a/TMPK. 
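
To put the interface numbers above in proportion, the fraction of each 
monomer's surface that is buried in the dimer can be computed directly 
from the two approximate areas quoted in the text. This is only an 
arithmetic illustration; the function name is hypothetical and the 
calculation is not a substitute for the PISA analysis itself. 

```python
# Approximate per-monomer areas quoted in the text, in square angstroms.
BURIED_AREA = 1025.0
MONOMER_SURFACE_AREA = 9000.0


def buried_fraction(buried_area, total_area):
    """Fraction of a monomer's surface area buried in an interface."""
    if total_area <= 0:
        raise ValueError("total_area must be positive")
    return buried_area / total_area


fraction = buried_fraction(BURIED_AREA, MONOMER_SURFACE_AREA)
print(f"buried fraction: {100 * fraction:.1f}%")  # prints 11.4%
```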

The fold seen for EhchA.01616.a/TMPK is as expected for TMPKs: 
a central five-stranded β-sheet is sandwiched between two α-helices 
on one side and five α-helices on the other (Fig. 1). The four chains 
of EhchA.01616.a/TMPK are quite similar and superimpose with 
r.m.s.d.s of ~0.4-0.5 Å for Cα atoms. An SSM search of the PDB for 
structural homologs reveals apo thymidylate kinase from A. aeolicus 
(PDB entry 2pbr) as the closest homolog, with r.m.s.d.s of around 
1.1 Å and some distinct deviations of the C-termini. The second 
closest homolog is the structure of ligand-bound thymidylate kinase 
from Thermotoga maritima (PDB entry 3hjn; S. Yoshikawa, N. 
Nakagawa, M. Shirouzu, S. Yokoyama & S. Kuramitsu, unpublished 
work), with r.m.s.d.s in the range 1.3-1.4 Å. 

A sulfate ion could be located in each of the monomers of 
EhchA.01616.a/TMPK. We assume that the sulfate ion was provided 
by the crystallization buffer, which contained 200 mM sodium sulfate. 
The structure of TMPK from A. aeolicus shows a sulfate ion in the 
same location (Fig. 2a). This protein was crystallized in the presence 
of 50 mM lithium sulfate. The structure of TMPK from T. maritima 
(PDB entry 3hjn) was crystallized in complex with adenosine 
5'-diphosphate (ADP) and thymidine 5'-diphosphate. The β-phosphate 
group of ADP in 3hjn superimposes with the sulfate in the 
other two structures. The nucleotide-binding pocket is structurally 
conserved between the ADP-bound T. maritima structure and the 
apo E. chaffeensis structure. Nucleotide binding would only require 
subtle structural changes that mostly involve side chains. As the 
binding pocket is accessible and is not blocked by the crystal lattice, 
it is likely that EhchA.01616.a/TMPK crystals will be soakable with 
nucleotides. 

3.2. Comparison to human TMPK 

EhchA.01616.a/TMPK has only 25% amino-acid sequence identity 
to the human TMPK protein. When compared with human TMPK 
bound to ADP, TMP and Mg²⁺ (PDB entry 1e2f; Ostermann et al., 
2000) there are a few observed structural differences. Most notable 
are the structural differences near the C-terminus. There is a loop 
found in the EhchA.01616.a/TMPK structure that is not observed in 
the human protein or PDB entries 2pbr or 3hjn. This loop appears to 
be a result of a five-amino-acid insertion from Tyr189 to Asp193. In 
the apo structure this loop is in close proximity to the ATP-binding 
site, with the loop oriented away from the binding site. It is unknown 
whether there are any conformational changes of the loop on 
nucleotide binding for the EhchA.01616.a/TMPK protein. It is also 
unknown whether this loop has any biological significance or whether 
this unique structural feature can be exploited for targeted drug 
development. 

Figure 1 

Dimer of EhchA.01616.a/TMPK formed by monomers A and B. The ribbons are 
colored by secondary structure. Two sulfate ions are shown as yellow/red sticks. 

Figure 2 

Superposition of EhchA.01616.a/TMPK monomer A with (a) thymidylate kinase 
from A. aeolicus (2pbr) and (b) thymidylate kinase from T. maritima (3hjn). In each 
figure the EhchA.01616.a structure is shown in the same colours as in Fig. 1, while 
the ribbons for TMPK from A. aeolicus and T. maritima are shown in light gray. 
Ligands for each structure are shown as coloured stick models. The sulfate ions in 
EhchA.01616.a/TMPK and A. aeolicus TMPK superimpose. They also superimpose 
with a phosphate of ADP in the T. maritima structure. 

The P-loop nucleoside-binding motif (GX₄GKS/T) found in many 
nucleotide-binding proteins is present in both the human and 
Ehrlichia TMPKs (Saraste et al., 1994). Specifically, the P-loop amino-acid 
sequences of the human and Ehrlichia proteins are GVDRAGKS and 
GIDGSGKT, respectively. These motifs both contain an acidic Asp 
residue that is uniquely found in TMPKs compared with other 
nucleoside monophosphate kinases (Lavie et al., 1998). The human 
enzyme is a type I TMPK, in which the Asp15 residue is immediately 
followed by a catalytically important arginine residue. The Ehrlichia 
protein is instead a type II TMPK, with the Asp9 residue being 
followed by a glycine residue (Lavie et al., 1998). The P-loops of the 
human and Ehrlichia enzymes have no major structural differences. 
The P-loop is one of three regions known to undergo conformational 
changes on substrate binding (Ostermann et al., 2000). 
Substrate-bound structures of EhchA.01616.a/TMPK would be needed in order 
to understand the conformational changes of the P-loop in 
comparison to those of the human protein. Given the difference in the 
catalytic importance of the P-loop between type I and type II TMPKs, 
it may be possible to exploit this difference for drug design. 
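
The motif and the type I/type II distinction described above can be 
expressed as a short pattern check. The sketch below matches the two 
P-loop sequences quoted in the text against GX₄GKS/T and then inspects 
the residue that follows the conserved Asp. The function name and the 
rule "arginine after Asp means type I, anything else means type II" are 
simplifying assumptions made for illustration; a real classification 
would also consider the LID region. 

```python
import re

# GX4GKS/T: Gly, any four residues, Gly, Lys, then Ser or Thr.
P_LOOP_MOTIF = re.compile(r"G.{4}GK[ST]")


def classify_p_loop(p_loop):
    """Classify a P-loop sequence as 'type I' or 'type II' (simplified).

    Raises ValueError if the sequence does not match GX4GKS/T or lacks
    an Asp residue followed by another residue.
    """
    if P_LOOP_MOTIF.fullmatch(p_loop) is None:
        raise ValueError(f"{p_loop!r} does not match GX4GKS/T")
    asp_index = p_loop.find("D")
    if asp_index == -1 or asp_index + 1 >= len(p_loop):
        raise ValueError(f"{p_loop!r} has no Asp followed by a residue")
    # Simplified rule: Arg directly after Asp marks a type I P-loop.
    return "type I" if p_loop[asp_index + 1] == "R" else "type II"


# P-loop sequences quoted in the text.
HUMAN_P_LOOP = "GVDRAGKS"
EHRLICHIA_P_LOOP = "GIDGSGKT"

print(classify_p_loop(HUMAN_P_LOOP))      # prints type I
print(classify_p_loop(EHRLICHIA_P_LOOP))  # prints type II
```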

The flexible LID region also undergoes conformational changes 
and has catalytic differences between type I and type II TMPKs; the 
LID region closes upon ATP binding (Ostermann et al., 2000). The 
LID region remains unmodeled in the apo EhchA.01616.a/TMPK 
structure. As for the P-loop, substrate-bound structures would be 
needed to fully compare the EhchA.01616.a/TMPK and human 
TMPK LID regions. There is no evidence that the overall structure 
of the LID region of EhchA.01616.a/TMPK would be significantly 
different from that of the human protein. However, there are 
significant amino-acid differences between Ehrlichia, human and 
other type II TMPKs. The catalytic arginine found in the P-loop of 
type I TMPKs is found in the LID region of type II TMPKs. Typically, 
type II TMPKs have several basic residues in the LID region; for 
example, E. coli TMPK contains five basic residues in the region as 
opposed to three in the human protein (Lavie et al., 1998). The basic 
residues of the E. coli protein consist of Lys148, Arg149, Arg151, 
Arg153 and Arg158, with Arg153 assuming the catalytic role of Arg16 
in the P-loop of the human TMPK. The Ehrlichia protein only 
contains two basic residues in the LID region, Arg141 and Lys144, 
with Arg141 presumed to be the catalytic residue. Since we do not 
currently have substrate-bound EhchA.01616.a/TMPK structures to 
fully compare with the human protein, it is difficult to determine the 
ability to target the protein with a novel drug based on structural 
differences alone. Based on both the catalytic differences of the 
P-loop and LID region and amino-acid sequence differences, there is a 
possibility of specifically targeting EhchA.01616.a/TMPK over the 
human homologue. 



4. Conclusion 

This paper describes a purification strategy that results in 
EhchA.01616.a/TMPK of crystallizable quality. The resulting 2.15 Å 
resolution crystal structure contained two dimers. While the fold is 
conserved within the TMPK family, significant changes are seen at the 
C-terminus which also have an impact on the molecular-replacement 
strategy. It is unknown whether there are biological implications of 
the difference in the C-terminus compared with other TMPKs. A 
sulfate ion from the crystallant occupies the β-phosphate position of 
the ADP observed in homologous structures. Furthermore, 
substrate-bound structures of EhchA.01616.a/TMPK would be beneficial to 
fully analyze the structural differences between the Ehrlichia and 
human proteins. At the time of publication, only nine structures of 
proteins from E. chaffeensis had been deposited in the PDB. 

The authors wish to thank all of the members of the SSGCID 
team. This research was funded under Federal Contract No. 
HHSN272200700057C from the National Institute of Allergy and 
Infectious Diseases, the National Institutes of Health, Department of 
Health and Human Services. 